We propose and investigate a unifying class of sparse random graph models, based on a hidden
coloring of edge-vertex incidences, extending an existing approach, random graphs with a given de- 
gree distribution, in a way that admits a nontrivial correlation structure in the resulting graphs. The 
approach unifies a number of existing random graph ensembles within a common general formalism, 
and allows for the analytic calculation of observable graph characteristics. In particular, generating 
function techniques are used to derive the size distribution of connected components (clusters) as 
well as the location of the percolation threshold where a giant component appears. 



PACS numbers: 02.50.-r, 64.60.-i, 89.75.Fb 
Introduction. 

There is a growing interest in complex networks, in 
the physics community as well as in other sciences, 
partly due to an increased availability of data on 
real-world networks. This is reflected in a rapidly
increasing number of models of random graphs [1, 2, ...]
and dynamical random graphs [...], with varying degrees
of generality.

This multitude of models calls for a unifying for- 
malism, including more specific models as special 
cases, while allowing for the calculation of observ- 
able characteristics that can be compared to those of 
real networks. Dynamical models are interesting in 
their own right, but the dynamics is seldom directly 
observable in real-world networks, and we will focus
on static ensembles of random graphs, irrespective 
of whether they result from a dynamical process or 
not. 

Specifically, we will consider models of simple, 
undirected graphs that are sparse (the edge count 
grows linearly with the node count N) and truly ran- 
dom (having no underlying regular structure). The 
classic random graph in its sparse version is of this
type [...], where each of the $N(N-1)/2$ possible
edges is independently and randomly realized
with a fixed probability $p = c/N$. It has a Poissonian
asymptotic degree (connectivity) distribution with 
average c, and a percolation threshold at c = 1. It 
fails, however, to describe most real-world networks.

Instead we turn to two of the more general approaches,
based on slightly different philosophies.
One, to be referred to as DRG (Degree-driven random
graphs), amounts to choosing a random member
from the set of simple labelled graphs with a
given arbitrary degree distribution [2, ...]. The
other is Inhomogeneous random graphs (IRG),
where the classic model is generalized by randomly
coloring vertices according to a color distribution
$\{r_i\}$, and realizing edges independently with
color-dependent probabilities $c_{ij}/N$. Both yield
analytically tractable models displaying well-defined
percolation thresholds and degree distributions, and both
include a number of more specific models. Both also
have limitations: DRG fails to produce non-trivial
edge correlations, as seen in the factorization of the
combined degree distribution of connected vertex
pairs [...]; in IRG, the resulting degree distribution
is limited to a mix of Poissonians [4].

These approaches are not unrelated: The restric- 
tion of DRG to degree distributions in the form of a 
Poissonian mix is in fact asymptotically equivalent 
to the restriction of IRG to a rank-one $c$ matrix,
$c_{ab} = c_a c_b$ (exhibiting DRG's lack of correlations);
this common subset contains the classic model [...].

Basic Idea. 

By combining the philosophies of DRG and IRG, a 
more general class of analytically tractable sparse 
random graph models can be constructed. This uni- 
fying approach, to be referred to as CDRG (for Col- 
ored DRG), contains IRG and DRG as particular 
subsets, and is defined as a direct extension of DRG 
by assigning a hidden color to each vertex connection
(a half-edge, or stub). As a result each edge will
be associated with a pair of colors, one for each endpoint.
We then consider a given distribution $\{p_{\mathbf m}\}$
of colored degrees $\mathbf m = (m_1, \ldots, m_K)$, where for each
vertex its number $m_k$ of stubs of each color $k$ is accounted
for, and allow the edge distribution to be
color-sensitive by specifying also the distribution of
edge color pairs. The resulting ensemble of stub-colored
graphs yields, if the coloring is considered
unobservable, a well-defined graph ensemble. The
coloring thus can be thought of as a set of hidden 
variables, the purpose of which is to induce correla- 
tions in the resulting graphs. 

Below, we will discuss the definition and imple- 
mentation of CDRG models, derive the asymptotic 
cluster size distribution yielding equations for the 
percolation threshold, and identify the subsets cor- 
responding to DRG (trivial) and IRG (less trivial). 

Asymptotic Model Specification.

A particular asymptotic CDRG model is defined by specifying:

• a definite color space, say $\{1, 2, \ldots, K\}$;

• an asymptotic colored degree distribution
(CDD), $p_{\mathbf m}$, defining the relative frequencies
of vertices with different colored degrees
$\mathbf m = (m_1, \ldots, m_K)$, where $m_a$ is the number of
$a$-colored stubs of the vertex. We will assume
here that all its moments, $\langle m_a \rangle = \sum_{\mathbf m} p_{\mathbf m} m_a$,
$\langle m_a m_b \rangle$, etc., are defined;

• a symmetric, non-negative $K \times K$ color preference
matrix $T$, controlling the relative abundance,
$\sim \langle m_a \rangle T_{ab} \langle m_b \rangle$, of edges between
different color pairs $a, b$. It must satisfy

$$\sum_{b=1}^{K} T_{ab} \langle m_b \rangle = 1. \qquad (1)$$

Note that the total degree of a vertex is simply the 
sum of its colored degree components; the usual degree
distribution is thus also fixed, and amounts to the
distribution of the sum $\sum_a m_a$ under the CDD.
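
The three ingredients of the specification can be checked numerically. The following is a minimal sketch, assuming a CDD with finite support stored as a dictionary from colored-degree tuples to probabilities; the function names and the two-color example values are illustrative assumptions, not taken from the text.

```python
def mean_colored_degrees(cdd, K):
    """Return [<m_1>, ..., <m_K>] for a CDD given as {colored degree tuple: probability}."""
    return [sum(p * m[a] for m, p in cdd.items()) for a in range(K)]


def constraint_residuals(cdd, T):
    """Return sum_b T_ab <m_b> - 1 for each color a; all zeros means constraint (1) holds."""
    K = len(T)
    mean = mean_colored_degrees(cdd, K)
    return [sum(T[a][b] * mean[b] for b in range(K)) - 1.0 for a in range(K)]


def total_degree_distribution(cdd):
    """Marginal distribution of the total degree m = sum_a m_a implied by the CDD."""
    dist = {}
    for m, p in cdd.items():
        dist[sum(m)] = dist.get(sum(m), 0.0) + p
    return dist


# Hypothetical two-color model: half the vertices carry two stubs of color 0,
# the other half carry one stub of color 0 and two stubs of color 1.
cdd = {(2, 0): 0.5, (1, 2): 0.5}
T = [[0.5, 0.25],
     [0.25, 0.625]]  # symmetric, non-negative, chosen to satisfy constraint (1)

print(mean_colored_degrees(cdd, 2))
print(constraint_residuals(cdd, T))
print(total_degree_distribution(cdd))
```

In this example the off-diagonal element of $T$ was picked freely and the diagonal was then fixed by constraint (1), which illustrates that the constraint removes $K$ degrees of freedom from the symmetric matrix.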

Truncation to Finite N . 

We want to implement such an asymptotic model
with a specific $N$. This can be done e.g. by
transforming the CDD into a definite colored degree
sequence, as described by the number of vertices
$N_{\mathbf m} \approx N p_{\mathbf m}$ with colored degree $\mathbf m$, subject to
obvious constraints such as $m < N$, $\sum_{\mathbf m} N_{\mathbf m} = N$,
and $\sum_{\mathbf m} m N_{\mathbf m}$ being even (the total number of stubs,
with $m = \sum_a m_a$ the total degree). Similarly, the matrix $T$ is
used to determine the number of edges with color
pair $ab$ as $n_{ab} \approx N \langle m_a \rangle T_{ab} \langle m_b \rangle$. Note that each
$ab$-edge is counted twice, as $ab$ and as $ba$, so the
diagonal elements, $n_{aa}$, must be even. The number
of edge endpoints (butts) with color $a$ becomes
$n_a = \sum_b n_{ab} \approx N \langle m_a \rangle \sum_b T_{ab} \langle m_b \rangle$, and care must
be taken that this matches the corresponding number
of stubs, $\sum_{\mathbf m} m_a N_{\mathbf m} \approx N \langle m_a \rangle$; hence the
constraint (1) on $T$.

This yields a pool of vertices with definite col- 
ored degrees and a pool of edges with definite color 
pairs, all to be considered distinguishable. The set 
of distinct ways to combine these into a simple graph 
with color-matching between butts and stubs defines 
a set of colored graphs. By drawing a random mem- 
ber from this set and neglecting the coloring, the 
desired truncated CDRG ensemble results. 
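
The consistency conditions on a truncated model can be verified before any graph is generated. The sketch below assumes the vertex pool is given as a dictionary from colored-degree tuples to vertex counts and the edge pool as an integer matrix of color-pair counts; the function name and the example values are hypothetical.

```python
def check_truncation(N_m, n_ab):
    """List violated consistency conditions for a truncated CDRG model.

    N_m  : {colored degree tuple: number of vertices with that colored degree}
    n_ab : K x K integer matrix; n_ab[a][b] counts edges with color pair ab,
           so that each aa-edge contributes 2 to the diagonal element n_ab[a][a].
    """
    K = len(n_ab)
    problems = []
    for a in range(K):
        if n_ab[a][a] % 2 != 0:
            problems.append(f"diagonal element n[{a}][{a}] must be even")
        for b in range(a + 1, K):
            if n_ab[a][b] != n_ab[b][a]:
                problems.append(f"n[{a}][{b}] and n[{b}][{a}] differ")
        stubs = sum(m[a] * count for m, count in N_m.items())
        butts = sum(n_ab[a])  # n_a = sum_b n_ab
        if stubs != butts:
            problems.append(f"color {a}: {stubs} stubs but {butts} butts")
    return problems


# Hypothetical pool: four vertices, each with one stub of each of two colors.
N_m = {(1, 1): 4}
print(check_truncation(N_m, [[2, 2], [2, 2]]))  # consistent edge pool
print(check_truncation(N_m, [[1, 3], [3, 1]]))  # odd diagonal elements
```

An empty list means that stubs and butts match color by color, which is the finite-$N$ counterpart of the constraint on $T$.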

Implementation in Practice. 

When it comes to the practical task of generating
random graphs from this ensemble, the tricky step
is that of picking a random member from the set of
colored graphs consistent with definite $N_{\mathbf m}$ and $n_{ab}$.
A random stub-pairing method for DRG [2] can be
extended to the case of colored stubs as follows.

1. For each color $a$, make a complete random
assignment between the $n_a$ butts of color $a$ and
the $n_a$ matching stubs, to determine which
butt should attach to which stub.

2. While the resulting graph is not simple, repeat
step 1.
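
The two steps above can be sketched in code as follows. This is an illustrative implementation under stated assumptions, not the author's code: vertices are labelled 0, 1, ..., the colored degree sequence is a list of tuples, the edge pool is the matrix of color-pair counts with each same-color edge counted twice on the diagonal, and the function name and the `max_attempts` cutoff are hypothetical.

```python
import random


def colored_stub_pairing(degrees, n_ab, rng, max_attempts=1000):
    """Random color-matched pairing of butts and stubs, repeated until the graph is simple.

    degrees : list of colored degree tuples, one per vertex
    n_ab    : symmetric K x K matrix of edge counts per color pair (even diagonal)
    Returns a list of edges (u, v) with u < v, or None if every attempt failed.
    """
    K = len(n_ab)
    # Edge pool: one entry per undirected edge, holding the butt color of each end.
    edge_colors = []
    for a in range(K):
        edge_colors += [(a, a)] * (n_ab[a][a] // 2)
        for b in range(a + 1, K):
            edge_colors += [(a, b)] * n_ab[a][b]
    # Butts of each color, identified by (edge index, end index).
    butts = [[] for _ in range(K)]
    for e, (a, b) in enumerate(edge_colors):
        butts[a].append((e, 0))
        butts[b].append((e, 1))
    # Stubs of each color, identified by the vertex they belong to.
    stubs = [[v for v, m in enumerate(degrees) for _ in range(m[a])] for a in range(K)]
    if any(len(butts[a]) != len(stubs[a]) for a in range(K)):
        raise ValueError("stub and butt counts do not match for every color")

    for _ in range(max_attempts):
        ends = [[None, None] for _ in edge_colors]
        # Step 1: for each color, attach butts to stubs by a uniform random assignment.
        for a in range(K):
            rng.shuffle(stubs[a])
            for (e, end), v in zip(butts[a], stubs[a]):
                ends[e][end] = v
        edges = [tuple(sorted(pair)) for pair in ends]
        # Step 2: accept only a simple graph (no loops, no multiple edges).
        if all(u != v for u, v in edges) and len(set(edges)) == len(edges):
            return edges
    return None


# Hypothetical usage: four vertices, each with one stub of each of two colors.
edges = colored_stub_pairing([(1, 1)] * 4, [[2, 2], [2, 2]], random.Random(0))
print(edges)
```

Rejecting the whole pairing and starting over, rather than repairing the offending edges, keeps the accepted graphs uniformly distributed over the set of simple colored graphs, at the price of repeated attempts.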

Alternatively, the implementation could be done
in a fully stochastic manner, where an extra initial
step is to draw $N$ colored degrees independently
from $p_{\mathbf m}$, and a pool of edges from
$q_{ab} = \langle m_a \rangle T_{ab} \langle m_b \rangle / \sum_c \langle m_c \rangle$, subject to matching counts
of stubs and butts of each color. In the thermodynamic
limit, the result would be equivalent. Such
a method would be more in line with the identification
of CDRG with the Feynman graphs of zero-dimensional
multi-component field theories, in analogy
to the relation between DRG models and
zero-dimensional scalar field theories.

Of course, either generation method is feasible 
only if the probability of obtaining a simple graph 
in each pairing attempt is not too small. This prob- 
ability is asymptotically calculable. 

Pairing Efficiency. 

A completely random pairing without the restriction 
that the resulting graph be simple yields an ensem- 
ble of multigraphs, i.e. possibly non-simple graphs 
where loops (cycles of length 1) and/or multiple edges 
are allowed. The efficiency of the above method de- 
pends on the probability to obtain a simple graph, 
which in turn depends on the abundance of loops and 
multiple edges. In a sparse graph, the probability for 
an edge between a given pair of nodes scales as $1/N$,
so we expect a finite number both of double edges
(a factor of $N^2$ for the choice of a node pair, and
$1/N^2$ for two edges), and of loops ($N$ for the choice
of node, and $1/N$ for the edge making a loop).

In fact, we can compute the asymptotically ex- 
pected number of loops and double edges in a ran- 
dom pairing to leading order: 

Loops: For a single vertex with colored degree $\mathbf m$, the
probability that two of its stubs will be connected
is given by $\sum_{ab} (m_a m_b - m_a \delta_{ab}) T_{ab} / 2N$. Averaging
over $\mathbf m$ and summing over the node choice yields the
expected number of loops as $\alpha = \sum_{ab} M_{ab} T_{ab} / 2$, i.e.

$$\alpha = \frac{1}{2} \mathrm{Tr}(TM), \qquad (2)$$

where $M = \{M_{ab}\}$ stands for the matrix of moments
$\langle m_a m_b - m_a \delta_{ab} \rangle$.

Double edges: Similarly, for an arbitrary pair of
nodes with colored degrees $\mathbf m, \mathbf m'$, the probability
of a double edge asymptotically amounts to
$\sum_{abcd} (m_a m_b - m_a \delta_{ab})(m'_c m'_d - m'_c \delta_{cd}) T_{ac} T_{bd} / (2N^2)$.
Averaging over $\mathbf m, \mathbf m'$ and summing over the choice
of node pair yields the expected number of double
edges as $\beta = \sum_{abcd} M_{ab} M_{cd} T_{ac} T_{bd} / 4$, i.e.

$$\beta = \frac{1}{4} \mathrm{Tr}\,(TM)^2, \qquad (3)$$

while triple edges etc. can be neglected altogether.

In a similar way, the asymptotically expected 
number of more general small subgraphs can be 
computed, which in particular enables the computa- 
tion of the expectation of higher powers of the loop 
and double edge counts, resulting in the two counts 
asymptotically behaving as independent Poissonian 
random variables. Hence, the probability of obtain- 
ing a simple graph in the random pairing can be 
estimated as 

$$\mathrm{Prob(simple)} \approx \exp(-\alpha - \beta). \qquad (4)$$

As a result, an average of $\sim \exp(\alpha + \beta)$ pairing attempts
is needed, rendering the method feasible for
reasonably small $\alpha + \beta$; in other cases an alternative
generation method will have to be employed, such as
starting from an arbitrary colored graph consistent
with $N_{\mathbf m}$, $n_{ab}$ and applying a colored extension of
a degree-preserving random rewiring algorithm suggested
for DRG [16].
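
Eqs. (2)-(4) are straightforward to evaluate for a CDD with finite support. The following sketch assumes the same dictionary representation of the CDD as a map from colored-degree tuples to probabilities; the helper names are illustrative. The single-color, 3-regular example is an assumed test case.

```python
import math


def moment_matrix(cdd, K):
    """M_ab = <m_a m_b - m_a delta_ab> for a CDD given as {colored degree tuple: probability}."""
    return [[sum(p * (m[a] * m[b] - (m[a] if a == b else 0)) for m, p in cdd.items())
             for b in range(K)] for a in range(K)]


def matmul(A, B):
    """Plain matrix product of two square matrices stored as nested lists."""
    K = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(K)) for j in range(K)] for i in range(K)]


def pairing_efficiency(cdd, T):
    """Return (alpha, beta, Prob(simple)) according to eqs. (2), (3) and (4)."""
    K = len(T)
    TM = matmul(T, moment_matrix(cdd, K))
    TM2 = matmul(TM, TM)
    alpha = 0.5 * sum(TM[a][a] for a in range(K))    # expected number of loops
    beta = 0.25 * sum(TM2[a][a] for a in range(K))   # expected number of double edges
    return alpha, beta, math.exp(-(alpha + beta))


# Assumed example: one color, every vertex of degree 3, so T = 1/<m> = 1/3.
alpha, beta, p_simple = pairing_efficiency({(3,): 1.0}, [[1.0 / 3.0]])
print(alpha, beta, p_simple, 1.0 / p_simple)  # last value: expected number of attempts
```

For this example $TM = 2$, so $\alpha = \beta = 1$ and roughly $e^{2} \approx 7.4$ pairing attempts are expected per accepted graph.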

Connected Component Statistics. 
The size distribution of the connected components
(clusters) of a random graph can be probed by choosing
an initial vertex at random and recursively following
edges to new neighbors [14]. The sparsity
of edges forces a finite set of revealed vertices to
form a tree in the thermodynamic limit, since cross-linking
is suppressed by factors of $1/N$. Hence, loops
and double edges can be neglected to leading order,
and the random color-matched pairing between
stubs and butts reduces to a random branching process
(branched polymer) based on the rules: (i) an
edge emanating from a stub of color $a$ ends in a stub
of color $b$ with probability $T_{ab} \langle m_b \rangle$; (ii) given the
color $b$ of a stub, it belongs to a vertex with colored
degree $\mathbf m$ with probability $m_b p_{\mathbf m} / \langle m_b \rangle$.

The asymptotic random branching process is conveniently
described in terms of a generating function
$g(z) = \sum_n P_n z^n$ for the probability $P_n$ that
the connected component being revealed consists of
$n$ vertices. $g(z)$ can be expressed in terms of the
corresponding generating functions $\mathbf h(z) = \{h_a(z)\}$
for the number of nodes in a branch starting from a
stub of color $a$. $g(z)$ and $\mathbf h(z)$ satisfy the recursive
relations

$$g(z) = z \sum_{\mathbf m} p_{\mathbf m} \prod_a h_a(z)^{m_a} = z H(\mathbf h(z)), \qquad (5a)$$

$$h_a(z) = z \sum_b T_{ab} \sum_{\mathbf m} p_{\mathbf m} m_b \prod_c h_c(z)^{m_c - \delta_{cb}} = z \sum_b T_{ab} \partial_b H(\mathbf h(z)), \qquad (5b)$$

where $H(\mathbf x) = \sum_{\mathbf m} p_{\mathbf m} \mathbf x^{\mathbf m} \equiv \sum_{\mathbf m} p_{\mathbf m} \prod_a x_a^{m_a}$ is the
multivariate generating function for the CDD, while
$\partial_b$ stands for the derivative with respect to the $b$th
argument of $H$. Eqs. (5) can be derived as follows.
(5a): The explicit factor of $z$ accounts for the initial
vertex, while the remainder consists in an average
over the colored degree $\mathbf m$ of the initial vertex, of a
factor $h_a(z)$ for each stub of color $a$, accounting for
the contribution of the branch starting in that stub.
(5b): Starting from a stub of color $a$, the asymptotic
probability that the other end of the attached edge
has color $b$ and is connected to a vertex having colored
degree $\mathbf m$ is given by $T_{ab} p_{\mathbf m} m_b$; include a factor
$z$ for that vertex, and a factor $h_c(z)$ for each branch
reached via one of its remaining $(m_c - \delta_{cb})$ stubs of
color $c$.
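
At $z = 1$ the recursion (5b) can be solved numerically by fixed-point iteration, and (5a) then gives the total probability $g(1)$ of landing in a finite component. The sketch below is an illustration under stated assumptions: the CDD has finite support and is stored as a dictionary, the iteration starts from $\mathbf h = 0$ (for a map with non-negative coefficients this increases monotonically towards the smallest non-negative fixed point), and the function name and iteration count are hypothetical. Convergence becomes slow close to criticality.

```python
import math


def giant_component_fraction(cdd, T, iters=500):
    """Estimate 1 - g(1) by iterating eq. (5b) at z = 1, starting from h = 0.

    cdd : {colored degree tuple: probability}, finite support
    T   : K x K color preference matrix satisfying constraint (1)
    """
    K = len(T)
    h = [0.0] * K
    for _ in range(iters):
        # dH[b] = derivative of H with respect to its b-th argument, evaluated at h.
        dH = []
        for b in range(K):
            total = 0.0
            for m, p in cdd.items():
                if m[b] == 0:
                    continue  # no stub of color b, so no contribution
                term = p * m[b]
                for c in range(K):
                    term *= h[c] ** (m[c] - (1 if c == b else 0))
                total += term
            dH.append(total)
        h = [sum(T[a][b] * dH[b] for b in range(K)) for a in range(K)]
    # Eq. (5a) at z = 1.
    g1 = sum(p * math.prod(h[a] ** m[a] for a in range(K)) for m, p in cdd.items())
    return 1.0 - g1


# Assumed single-color example: degrees 1 and 3 with equal weight, T = 1/<m> = 1/2.
print(giant_component_fraction({(1,): 0.5, (3,): 0.5}, [[0.5]]))
```

In the example the recursion reads $h = 1/4 + 3h^2/4$, whose smallest root is $h = 1/3$; this gives $g(1) = 5/27$ and hence a giant component containing a fraction $22/27$ of the vertices.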

Percolation Threshold. 

Of particular interest is the value of $g$ for $z = 1$:
naively we expect $g(1) = h_a(1) = 1$, expressing the
normalization of probability. Indeed, this defines a
fixed point of the recurrences (5), which however
may be unstable. The stability can be analyzed by
linearization of eq. (5b) around $\mathbf h(1) = \mathbf 1$, yielding
the Jacobian matrix $J$ defined by

$$J_{ab} = \sum_c T_{ac} \, \partial_c \partial_b H(\mathbf h) \big|_{\mathbf h = \mathbf 1}, \qquad (6)$$

which can be written as $J = TM$ (cf. eqs. (2), (3)).

The point is that if an eigenvalue of $J$ exceeds
1, the naive fixed point $\mathbf h(1) = \mathbf 1$ turns unstable,
signalling supercriticality of the branching process.
In such a case another fixed point will appear, and
take over as a stable solution with $h_a(1) < 1$, yielding
$g(1) < 1$. Analogous phenomena occur in the
classic model as well as in IRG and DRG; the associated
probability deficit $1 - g(1)$ is interpreted as
the probability of hitting a giant component asymptotically
containing a finite fraction $1 - g(1)$ of the
vertices. This corresponds to a percolating phase;
the percolation threshold is defined by the largest
eigenvalue of $TM$ being precisely 1.
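
The threshold criterion can be tested numerically by estimating the largest eigenvalue of $TM$. The sketch below uses power iteration on the shifted matrix $J + I$, a choice made here (not in the text) so that the iteration also converges when $J$ has eigenvalues $\pm\lambda$ of equal magnitude, as happens for bipartite-like color preferences. The function names and the two-color example are assumptions.

```python
def jacobian(cdd, T):
    """J = T M with M_ab = <m_a m_b - m_a delta_ab>, for a finite-support CDD."""
    K = len(T)
    M = [[sum(p * (m[a] * m[b] - (m[a] if a == b else 0)) for m, p in cdd.items())
          for b in range(K)] for a in range(K)]
    return [[sum(T[a][c] * M[c][b] for c in range(K)) for b in range(K)] for a in range(K)]


def largest_eigenvalue(J, iters=1000):
    """Largest eigenvalue of a non-negative matrix J via power iteration on J + I."""
    K = len(J)
    v = [1.0] * K
    lam = 0.0
    for _ in range(iters):
        w = [v[a] + sum(J[a][b] * v[b] for b in range(K)) for a in range(K)]  # (J + I) v
        norm = max(abs(x) for x in w)
        v = [x / norm for x in w]
        lam = norm - 1.0  # undo the shift by the identity
    return lam


# Assumed example: two vertex types with three stubs of a single color each,
# and edges allowed only between the two colors (T off-diagonal, T_01 = 1/<m_1> = 2/3).
cdd = {(3, 0): 0.5, (0, 3): 0.5}
T = [[0.0, 2.0 / 3.0],
     [2.0 / 3.0, 0.0]]
lam = largest_eigenvalue(jacobian(cdd, T))
print(lam, "percolating" if lam > 1.0 else "non-percolating")
```

Here $TM$ has eigenvalues $\pm 2$, so the largest eigenvalue exceeds 1 and the example lies in the percolating phase.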

Inclusion of other models. 

With a single color, $K = 1$, CDRG trivially reduces
to DRG, where a model is based on a given degree
distribution $\{p_m\}$, while the preference matrix $T$ reduces
to a number, which by virtue of the constraint
(1) must equal $\langle m \rangle^{-1}$. Equations (5) reduce to the
corresponding DRG equations,

$$g(z) = z H(h(z)), \qquad (7a)$$

$$h(z) = z \frac{H'(h(z))}{H'(1)}, \qquad (7b)$$

with $H(x) = \sum_m p_m x^m$ generating $p_m$. The percolating
phase is defined by $J = \langle m(m-1) \rangle / \langle m \rangle > 1$,
yielding the well-known $\langle m(m-2) \rangle > 0$ [14].
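
As a quick consistency check of the single-color criterion, consider a Poissonian degree distribution with average $c$, the asymptotic degree distribution of the classic model. The short calculation below is a worked example added for illustration; it uses only the definitions of $H$ and $J$ given above.

```latex
\begin{align*}
H(x) &= \sum_{m \ge 0} e^{-c} \frac{c^m}{m!} x^m = e^{c(x-1)}, \\
\langle m \rangle &= H'(1) = c, \qquad
\langle m(m-1) \rangle = H''(1) = c^2, \\
J &= \frac{\langle m(m-1) \rangle}{\langle m \rangle} = c
\quad\Longrightarrow\quad J > 1 \iff c > 1 .
\end{align*}
```

The threshold $c = 1$ agrees with the percolation threshold of the classic model, and the equivalent form $\langle m(m-2) \rangle = c^2 - c > 0$ gives the same condition.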

The relation to IRG is less trivial: Assume the
CDD to be in the form of a multi-Poissonian mix,
i.e. $H(\mathbf x) = \sum_i r_i \exp\left( \sum_a c_{ia} (x_a - 1) \right)$. Define

$$g_i(z) = z \exp\left( \sum_a c_{ia} (h_a(z) - 1) \right), \qquad (8a)$$

$$c_{ij} = \sum_{ab} c_{ia} T_{ab} c_{jb}, \qquad (8b)$$

in terms of which equations (5) reduce to

$$g(z) = \sum_i r_i g_i(z), \qquad (9a)$$

$$g_i(z) = z \exp\left( \sum_j r_j c_{ij} (g_j(z) - 1) \right). \qquad (9b)$$

Eqs. (9) exactly reproduce the result for $g(z)$ in an
IRG model with $r_i$ taken as the probability of vertex
color $i$ and $c_{ij}/N$ the probability of an edge between
a pair of vertices with colors $i, j$ [4].

Conversely, given an IRG model in terms of
$\{r_i, c_{ij}\}$, one can always find $\{c_{ia}, T_{ab}\}$ satisfying
eq. (8b) such that $\sum_a c_{ia} = \sum_j c_{ij} r_j$.

It follows that CDRG contains also ensembles re- 
sulting from dynamical models such as Randomly 
grown graphs [...] and Dynamical random graphs
with memory [8], that can be described in IRG [4],
albeit at the cost of infinitely many colors. 

Concluding Remarks. 

The above analysis shows that DRG and IRG can be 
unified into a more general class of random graph 
models, defined in terms of a hidden coloring of 
stubs and butts, with specified distributions of color-extended
vertex degrees as well as of edge color pairs.
The purpose of the hidden coloring is to enable 
a nontrivial correlation structure in the resulting 
graphs. 

This approach yields a general formalism for a
large class of analytically tractable models on a given
degree distribution, where local and global properties
of the resulting graphs are calculable in the thermodynamic
limit. Such a formalism also defines a
suitable target for statistical model inference based 
on observed structural properties. 

We have here assumed all moments of the degree 
distribution to exist, excluding e.g. power behav- 
ior. The approach will be extended also to models 
with "fat tails". These are sensitive to the precise 
truncation method and will be treated elsewhere. 

A more detailed investigation, addressing aspects 
and properties of CDRG models not treated in this 
letter, is in progress and will be the subject of a forthcoming
article, as will the extension to directed
graphs and to degree distributions with power tails. 

This work was in part supported by the Swedish 
Foundation for Strategic Research. 